Software Testing and the Naturally Occurring Data Assumption in Natural Language Processing
Abstract
It is a widely accepted belief in natural language processing research that naturally occurring data is the best (and perhaps the only appropriate) data for testing text mining systems. This paper compares code coverage using a suite of functional tests and using a large corpus and finds that higher class, line, and branch coverage is achieved with structured tests than with even a very large corpus.
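The comparison described in the abstract can be illustrated with a minimal sketch. This is not the paper's tooling or data: `classify_token`, `lines_executed`, the toy "corpus", and the structured inputs are all hypothetical names and values chosen for illustration, and only line coverage (not class or branch coverage) is measured, using Python's standard `sys.settrace` hook.

```python
import sys


def classify_token(token):
    """Toy stand-in for a text mining component (hypothetical)."""
    if token == "":
        return "empty"
    if token.isdigit():
        return "number"
    if token.isupper():
        return "acronym"
    if "-" in token:
        return "hyphenated"
    return "word"


def lines_executed(func, inputs):
    """Return the line offsets inside `func` that run while processing `inputs`."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        # Only follow frames that execute the function under test.
        if frame.f_code is code:
            if event == "line":
                hit.add(frame.f_lineno - code.co_firstlineno)
            return tracer
        return None

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        for item in inputs:
            func(item)
    finally:
        # Restore whatever trace function was active before.
        sys.settrace(previous)
    return hit


# Assumed "naturally occurring" input: ordinary lowercase words only.
corpus = "the quick brown fox jumps over the lazy dog".split()

# Assumed structured test inputs: one per branch of classify_token.
structured = ["", "42", "NLP", "state-of-the-art", "word"]

corpus_hit = lines_executed(classify_token, corpus)
tests_hit = lines_executed(classify_token, structured)

print("lines hit by corpus:          ", len(corpus_hit))
print("lines hit by structured tests:", len(tests_hit))
print("lines only the tests reach:   ", sorted(tests_hit - corpus_hit))
```

In this toy setting the corpus never exercises the early-return branches, however many ordinary words it contains, while five targeted inputs reach every one of them. A real study would rely on a dedicated coverage tool rather than a hand-written tracer.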
Similar resources
Application of Benford’s Law in Analyzing Geotechnical Data
Benford’s law predicts the frequency of the first digit of numbers met in a wide range of naturally occurring phenomena. In data sets that follow Benford’s law, numbers start with a small leading digit more often than with a large one. The law can be used as a tool for detecting fraud, abnormalities, and fabricated values in number sets. This can be used as an ...
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, the Penn Treebank, was published. However, although they originated in computational linguistics, the value of tree banks is becoming more widely appreciated in linguistics research as a whole. F...
Radiological dose assessment of naturally occurring radioactive materials generated by the petroleum industry in wildlife: A case study of chinkaras of Lavan Island, Iran
Human activities such as oil and gas production can enhance the natural level of naturally occurring radioactive materials (NORM) in by-product and waste streams. Iran has been among the top five oil producing countries since 2005. This high production rate emphasizes the importance of NORM management to ensure the safety of humans and wildlife. Petroleum storage and transport facilities are lo...
Assumption Grammars for Knowledge Based Systems, Veronica Dahl
In this paper we examine some knowledge base uses of a recently developed logic grammar formalism, Assumption Grammars, particularly suitable for hypothetical reasoning. They are based on intuitionistic and linear implications scoped over the current continuation, which allows us to follow given branches of the computation under hypotheses that disappear when and if backtracking takes place. In...
Investigating Non-Native English Speaking Graduate Students’ Pragmatic Development in Requestive Emails
The present study investigated learners’ interlanguage pragmatic development through analysis of 99 requestive emails addressed to a faculty member over a period of up to two years. Most previous studies mainly investigated how non-native English speaking students’ (NNESs) pragmalinguistic and sociopragmatic competence differed from native English speaking students (NESs) and compared learners ...